This report describes an extension and modification of the signature table
method, as described by Samuel [1], to adapt it for use in speech recognition.

While the original scheme can be used in this more demanding situation with a
moderate degree of success, it seemed desirable to adopt a somewhat more
rigorous procedure for at least two compelling reasons. In the first place,
the input parameters for the case of speech appear to be less independent of
each other than those for the game of checkers. The use of separate tables to
obtain correlation coefficients for subsets of the input parameters, and the
subsequent combination of different subsets by higher-level tables, is
therefore open to question. A second difficulty has to do with the
multicategory classification that is involved in the case of speech, where it
is essential for the outputs of the higher tables to have a clear
interpretation, in some absolute sense, so that they may be compared one with
another, something that was not required in the earlier situation where there
was but one final output. In the multicategory case the outputs must be
correlation coefficients, or at least factors from which such coefficients
can be computed, rather than correlations of the lower-level correlation
coefficients.

The object of the present report is to:
a) Investigate the conditions under which the decomposition of a
multidimensional space into two or more levels is valid, and
b) Devise a system of signature tables in which the outputs represent true
probability densities for each training class and in which the values stored
in these tables can be sequentially accumulated during a training phase.

For those not familiar with the use of signature tables by Samuel in programs
which played the game of checkers, the concept is best illustrated (Fig. 1)
by an arrangement of tables used in the program. There are 27 input terms.
Each term evaluates a specific aspect of a board situation and is quantized
into a limited but adequate range of values (7, 5, and 3 in this case). The
terms are divided into 9 sets of 3 terms each, forming the 9 first-level
tables. Outputs from the first-level tables are quantized to 5 levels and
combined into 3 second-level tables and, finally, into one third-level table
whose output represents the figure of merit of the board in question.

A signature table has an entry for every possible combination of values of
its input vector. Thus there are 7*5*3, or 105, entries in each of the
first-level tables. Training consists of accumulating two counts for each
entry during a training sequence: count A is incremented when the current
input vector represents a preferred move, and count D is incremented when it
does not. The output from the table is computed as a correlation coefficient
C = (A - D)/(A + D)
The figure of merit for a board is simply the coefficient obtained as the
output from the final table.
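
As a concrete illustration, the following short sketch (in Python, purely for
exposition; the class and method names are illustrative assumptions and not
Samuel's program) shows one first-level table with term ranges 7, 5 and 3,
trained by accumulating the A and D counts and reporting C = (A - D)/(A + D).

```python
# A minimal sketch of one first-level signature table with term ranges
# 7, 5 and 3, trained by accumulating the two counts described above.

class SignatureTable:
    def __init__(self, ranges=(7, 5, 3)):
        self.ranges = ranges
        size = 1
        for r in ranges:
            size *= r                  # 7*5*3 = 105 entries in this case
        self.A = [0] * size            # times the entry went with a preferred move
        self.D = [0] * size            # times it went with a non-preferred move

    def index(self, terms):
        # Concatenate the quantized terms into a single table address.
        i = 0
        for t, r in zip(terms, self.ranges):
            i = i * r + t
        return i

    def train(self, terms, preferred):
        if preferred:
            self.A[self.index(terms)] += 1
        else:
            self.D[self.index(terms)] += 1

    def output(self, terms):
        # Correlation coefficient C = (A - D)/(A + D), ranging from -1 to +1.
        i = self.index(terms)
        a, d = self.A[i], self.D[i]
        return 0.0 if a + d == 0 else (a - d) / (a + d)

table = SignatureTable()
table.train((3, 2, 1), preferred=True)
table.train((3, 2, 1), preferred=True)
table.train((3, 2, 1), preferred=False)
print(len(table.A), table.output((3, 2, 1)))   # 105 entries, C = 0.333...
```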

The following three advantages emerge from this method of training and
evaluation.
1) Essentially arbitrary interrelationships between the input terms are
taken into account within any one table. The only loss of accuracy is in the
quantization.
2) The training is a very simple process of accumulating counts. The
training samples are introduced sequentially, and hence simultaneous storage
of all the samples is not required.
3) The process linearizes the storage requirements in the parameter space.
In the case shown this requires only 343 entries instead of the 105↑9
entries that would be needed were the entire space to be represented.

Before investigating the conditions under which the decomposition of a
multidimensional space is valid, it will simplify the explanation if we first
consider a specific example in qualitative terms only.

Consider the simple table arrangement shown in Fig. 2, where we wish to take
account of 5 input parameters, each requiring 3 bits for its specification.
Were we to do all this in one table it would require 2↑15 or 32,768 entries,
instead of the 1024 entries required for the two-level arrangement shown.
What we require as the output from the first-level table is some function
which represents the contribution made by the inputs to the first table, so
that the output from the second table truly represents the conditional
probability that some specific training class has been represented by the
specific inputs. In accordance with the usual practice we want to determine
P( CLASS | (A,B,C,D,E) )
where A, B, C, D and E represent a specific set of input parameters.
Unfortunately this cannot be broken into two parts in any very simple way,
since the values of A, B, C, D and E are not unrelated. During the training
sequence we are given specific values from which it is easy to compute the
inverse probability
P( (A,B,C,D,E) | CLASS )
and we can equally well compute two separate probabilities
P( (A,B,C) | CLASS )   and   P( (D,E) | CLASS ).
What we want is
P( (A,B,C) | (D,E,CLASS) )
since we do not have space to store the simultaneous probability of all of
the input parameters, and we can compute this value from
P( (A,B,C,D,E) | CLASS ) = P( (A,B,C) | D,E,CLASS ) * P( (D,E) | CLASS ).
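
The identity above is simply the chain rule for conditional probabilities,
and it can be checked numerically. The following small sketch (Python, with
invented sample data used solely to exercise the arithmetic) estimates the
three quantities as relative frequencies over a handful of training vectors
from a single class and verifies that the product of the two factors
reproduces the joint probability.

```python
# Numerical check of P(A,B,C,D,E | class) = P(A,B,C | D,E,class) * P(D,E | class)
from collections import Counter
from fractions import Fraction

samples = [                      # (A, B, C, D, E) vectors, all from one class
    (1, 0, 2, 1, 1), (1, 0, 2, 1, 1), (1, 0, 2, 0, 1),
    (0, 1, 2, 1, 1), (1, 0, 1, 1, 1), (1, 0, 2, 1, 0),
]
n = len(samples)
full = Counter(samples)                                # counts of (A,B,C,D,E)
de = Counter((d, e) for _, _, _, d, e in samples)      # counts of (D,E)

v = (1, 0, 2, 1, 1)
p_joint     = Fraction(full[v], n)           # P( (A,B,C,D,E) | CLASS )
p_de        = Fraction(de[(1, 1)], n)        # P( (D,E) | CLASS )
p_abc_given = Fraction(full[v], de[(1, 1)])  # P( (A,B,C) | D,E,CLASS )

assert p_joint == p_abc_given * p_de
print(p_joint, "=", p_abc_given, "*", p_de)  # 1/3 = 1/2 * 2/3
```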

If we now focus our attention on the second table, we observe that the two
inputs D and E partition this table into sections. In the expanded case,
where all entries were placed in one table, there would have to be enough
entries in these sections to allow for all possible combinations of values
assignable to the variables A, B and C. However, since we are ultimately
interested not in which particular points in A,B,C space are associated with
each region but only in the probabilities, we need only allow enough space to
represent these probabilities to the desired accuracy. The outputs stored in
the first-level table for those A,B,C values that are to be grouped together
must therefore contain the same value. What we are doing is reducing the
dimensionality of the space represented by the second-level table by two, by
grouping together those points in A,B,C space which give the same overall
probability, given the values of D and E which define the region. In order to
be able to do this we must associate with each point in A,B,C space not only
the value of P( (A,B,C) | CLASS ) but also a value which contains
contributions made by all those combinations of D and E which are found to be
associated with the values of A, B and C which define this point. The
rigorous solution to this problem proves to be extremely awkward to effect,
but fortunately there is a relatively simple approximation which has been
found to yield useful results.
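
One way to picture the grouping is that the value stored in the output column
of the first-level table is quantized, so that every (A,B,C) point whose
stored value falls in the same quantization bin shares a single row of the
second-level table for any given D and E. The sketch below is only an
illustration under the Fig. 2 assumptions of a 3-bit first-level output and
3-bit D and E; the function names are mine, not the memo's.

```python
def quantize(p, levels=8):
    """Map a probability-like value in [0, 1] onto one of `levels` bins (3 bits here)."""
    return min(int(p * levels), levels - 1)

def second_level_index(q, d, e, d_range=8, e_range=8):
    # Concatenate the quantized first-level output with D and E to address
    # one of 8*8*8 = 512 second-level entries, as in the Fig. 2 example.
    return (q * d_range + d) * e_range + e

# Two (A,B,C) points whose stored values are nearly equal fall in the same
# bin, and therefore share one second-level entry for given values of D and E.
print(quantize(0.51), quantize(0.55))            # both -> 4
print(second_level_index(quantize(0.51), 3, 6))  # same address for both points
print(second_level_index(quantize(0.55), 3, 6))
```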

We can describe the process as follows, assuming that some training has
already taken place and that the tables have started to stabilize. Whenever
an input is encountered which belongs to the class defined by a particular
table array, the values of A, B, C, D and E are determined. A count of 1 is
added to the first field of the entry in the first-level table defined by A,
B and C. The output value from the third field of this entry is read; it
should be noted that this output has been computed on the basis of previous
training. This output, together with the values of D and E, locates an entry
in the second-level table, and a count of 1 is added to the first field of
the second-level table as defined by this entry. A correction is then made to
a second field of the first-level table which depends upon the concurrent
values of D and E, as computed by a method yet to be defined. The values
stored in the last or output column of this first-level table are then all
recomputed to allow for this correction. This calculation involves a count of
the total number of times this particular class has been encountered during
the entire learning sequence and the total number of learning samples, so
these counts must also be updated. Finally, the output figure in the last
field of the second table is also recomputed. A description of the exact
details of this updating process will be deferred until after the detailed
analysis which is given in the next section.
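
The structure of one such training step can be summarized in skeleton form as
below, under the Fig. 2 assumptions (3-bit fields, one table array per
training class). The correction applied to column (b) and the recomputation
of the output columns are exactly the pieces deferred to the next section, so
they are left as placeholders here rather than guessed at.

```python
# Structural skeleton of one training step; not the memo's actual formulas.

Q_LEVELS = D_RANGE = E_RANGE = 8

class ClassTableArray:
    """Tables 1 and 2 for one class, columns labelled as in the Fig. 2 caption."""

    def __init__(self, n1=512, n2=Q_LEVELS * D_RANGE * E_RANGE):
        self.col_a = [0] * n1      # table 1, column (a): class counts per (A,B,C)
        self.col_b = [0.0] * n1    # table 1, column (b): accumulated function of D,E
        self.col_c = [0] * n1      # table 1, column (c): quantized output
        self.col_d = [0] * n2      # table 2, column (d): class counts
        self.col_e = [0.0] * n2    # table 2, column (e): approximation to P(CLASS|A..E)
        self.class_count = 0       # times this class has been encountered
        # (the count of all learning samples, shared by every class array,
        #  is maintained elsewhere and omitted from this sketch)

    def second_index(self, q, d, e):
        # Concatenate the table-1 output with D and E into a table-2 address.
        return (q * D_RANGE + d) * E_RANGE + e

    def train_step(self, abc_index, d, e):
        self.class_count += 1
        self.col_a[abc_index] += 1                    # count of 1 added to column (a)
        q = self.col_c[abc_index]                     # output from earlier training
        self.col_d[self.second_index(q, d, e)] += 1   # count of 1 added to column (d)
        self.col_b[abc_index] += self.correction(d, e)
        self.recompute_outputs()                      # refresh columns (c) and (e)

    def correction(self, d, e):
        raise NotImplementedError("defined in the next section of the memo")

    def recompute_outputs(self):
        raise NotImplementedError("defined in the next section of the memo")
```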
01360
01370
01380
01390
01400 caption for FIG.2
01410
FIG. 2  Arrangement during training (simplified arrangement).
Inputs A, B and C are concatenated to form an address which points to an
entry in table 1 consisting of several fields or columns. Counts are
accumulated in column (a) of the number of times the class is found to
coincide with the particular values of A, B and C. An accumulation is made in
column (b) of a particular function of inputs D and E which also coincide
with the values of A, B and C which locate the entry. Data in column (c) are
computed on the basis of the data in columns (a) and (b), as explained
elsewhere.
The input f(A,B,C,D,E), taken from column (c) of table 1, together with D and
E, is similarly concatenated to form the address of an entry in table 2.
Counts are accumulated in column (d) of the number of times the class is
found to coincide with the values specifying the entry. Finally, column (e)
is computed as explained elsewhere and contains a reasonable approximation to
the desired values of
P( CLASS | (A,B,C,D,E) ).